In Example 1 we have $r = 3$.

Since 1 is not a zero of the polynomial $g(x)$, every irreducible
factor of $g(x)$ has degree at least 2, so it is clear that
$r \le \deg(g(x))/2$. Actually, most of the cyclotomic cosets are of
size $t$, so $r$ is roughly $\deg(g(x))/t$. When $t$ is a prime
number, we know each cyclotomic coset except $\{1\}$ is of size $t$
(see [11]). Thus $r = \deg(g(x))/t$ if $t$ is prime (see Examples 2
and 3 below).



Abstract - BCH (Bose-Chaudhuri-Hocquenghem) error
correcting codes ([1]-[2]) are now widely used in communication
systems and digital technology. Direct LFSR (linear feedback
shift register)-based encoding of a long BCH code suffers
from the serial-in and serial-out limitation and the large fanout
effect of some XOR gates. As a result, LFSR-based encoders of long
BCH codes cannot keep up with the data transmission speed
in some applications. Several parallel encoders for
long cyclic codes have been proposed in [3]-[8]. The technique
for eliminating the large fanout effect by the J-unfolding method
and some algebraic manipulation was presented in [7] and
[8]. In this paper we propose a CRT (Chinese Remainder
Theorem)-based parallel architecture for long BCH encoding.
Our novel technique can be used to eliminate the fanout
bottleneck. The only restriction on the speed of long BCH
encoding in our CRT-based architecture is $\log_2 N$, where $N$ is
the length of the BCH code.

Index Terms - Systematic BCH encoding, CRT (Chinese Remainder
Theorem), fanout, LFSR (linear feedback shift register),
parallel processing

I. Introduction and Preliminaries 

BCH codes were introduced in [1-2] and have been
extensively studied. Let $GF(2^t)$ be a finite field of $2^t$
elements and $\alpha \in GF(2^t)^* = GF(2^t) - \{0\}$ be a primitive
element. A vector $(c_0, \ldots, c_{N-1}) \in GF(2)^N$, where
$N = 2^t - 1$, is a codeword of the BCH code $C(\delta)$ of designed
distance $\delta$ if $\sum_{i=0}^{N-1} c_i \alpha^{ij} = 0$ for
$j = 1, 2, \ldots, \delta - 1$. It is well known that the minimum
Hamming distance of $C(\delta)$ is at least $\delta$. Any polynomial
in $GF(2)[x]$ can be factorized into a product of irreducible
polynomials in $GF(2)[x]$ (see [11]). Let
$w_1(x), \ldots, w_r(x) \in GF(2)[x]$ be the distinct monic
irreducible polynomials whose zeros are of the form
$\alpha^{j 2^d} \in GF(2^t)$, where $1 \le j \le \delta - 1$ and $d$
is an arbitrary non-negative integer. We know that the generator
polynomial $g(x)$ is the product $g(x) = w_1(x) \cdots w_r(x)$. It is
clear that $\deg(w_i(x)) \le t$ for $i = 1, \ldots, r$ (see [10]).
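The defining condition $\sum_i c_i \alpha^{ij} = 0$ can be checked numerically. The following is a minimal sketch, assuming $GF(16)$ is built from the primitive polynomial $x^4 + x + 1$ and using the generator polynomial of the $[15, 5]$ code of Example 1 as the test codeword; the helper names `build_exp_table` and `syndrome` are illustrative and do not come from the paper.

```python
def build_exp_table(t, prim_poly):
    """Return [alpha^0, alpha^1, ...] in GF(2^t), elements stored as bitmasks."""
    table = []
    x = 1
    for _ in range((1 << t) - 1):
        table.append(x)
        x <<= 1                 # multiply by alpha
        if x >> t:              # reduce modulo the primitive polynomial
            x ^= prim_poly
    return table


def syndrome(codeword, j, exp_table):
    """Evaluate sum_i c_i * alpha^(i*j); addition in GF(2^t) is XOR."""
    n = len(exp_table)
    s = 0
    for i, c in enumerate(codeword):
        if c:
            s ^= exp_table[(i * j) % n]
    return s


# Assumption: alpha is a root of x^4 + x + 1 (bitmask 0b10011).
exp_table = build_exp_table(4, 0b10011)

# Coefficients c_0..c_14 of g(x) = x^10 + x^8 + x^5 + x^4 + x^2 + x + 1.
g_exponents = {0, 1, 2, 4, 5, 8, 10}
codeword = [1 if i in g_exponents else 0 for i in range(15)]

# Designed distance 7 means the conditions hold for j = 1, ..., 6.
syndromes = [syndrome(codeword, j, exp_table) for j in range(1, 7)]
print(syndromes)
```

All six values vanish because $g(x)$ is itself a codeword; $j = 7$ gives a nonzero value, since $\alpha^7$ is not a zero of this code.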

Example 1 (see [10]). Let $C \subset GF(2)^{15}$ be the
$[15, 5, \ge 7]$ BCH code with zeros
$\alpha^1, \ldots, \alpha^6 \in GF(16)$, where $\alpha$ is a
primitive element of $GF(16)$. Its generator polynomial is
$g(x) = x^{10} + x^8 + x^5 + x^4 + x^2 + x + 1 =
(x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^2 + x + 1)$.
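The factorization in Example 1 can be confirmed by multiplying the three factors over $GF(2)$. This is a small sketch in which polynomials are stored as Python integers (bit $i$ is the coefficient of $x^i$); that representation and the name `gf2_mul` are choices made here for illustration.

```python
def gf2_mul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


w1 = 0b10011   # x^4 + x + 1
w2 = 0b11111   # x^4 + x^3 + x^2 + x + 1
w3 = 0b111     # x^2 + x + 1

g = gf2_mul(gf2_mul(w1, w2), w3)

# x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
g_expected = (1 << 10) | (1 << 8) | (1 << 5) | (1 << 4) | (1 << 2) | (1 << 1) | 1
print(g == g_expected)
```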

For a BCH code of length $2^t - 1$ and dimension
$2^t - 1 - \deg(g(x))$, the number $r$ is determined by the
cyclotomic coset decomposition of the zero set
$\{\alpha^1, \ldots, \alpha^{\delta-1}\}$.

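Example 1 (continued, see [10]). We have 3 cyclotomic
cosets $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$,
$\{\alpha^3, \alpha^6, \alpha^{12}, \alpha^9\}$ and
$\{\alpha^5, \alpha^{10}\}$, one for each irreducible factor of
$g(x)$.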
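The coset decomposition can be reproduced by repeatedly doubling exponents modulo $N = 2^t - 1$. The sketch below does this for the zero set $\alpha^1, \ldots, \alpha^6$ with $N = 15$; the function name `cyclotomic_cosets` is hypothetical.

```python
def cyclotomic_cosets(exponents, n):
    """Group exponents j into 2-cyclotomic cosets {j, 2j, 4j, ...} mod n."""
    cosets = []
    seen = set()
    for j in exponents:
        j %= n
        if j in seen:
            continue
        coset = []
        e = j
        while e not in coset:
            coset.append(e)
            e = (2 * e) % n   # squaring alpha^e doubles the exponent
        seen.update(coset)
        cosets.append(coset)
    return cosets


cosets = cyclotomic_cosets(range(1, 7), 15)
print(cosets)        # [[1, 2, 4, 8], [3, 6, 12, 9], [5, 10]]
print(len(cosets))   # 3
```

The coset sizes 4, 4 and 2 are the degrees of the three irreducible factors, and they sum to $\deg(g(x)) = 10$.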



The systematic encoding of a cyclic code with generator
polynomial $g(x)$ ($\deg(g(x)) = n - k$) and code length $n$
proceeds as follows. For a $k$-bit message
$m = (m_{k-1}, \ldots, m_0) \in GF(2)^k$, set
$m(x) = m_{k-1}x^{k-1} + \cdots + m_1 x + m_0 \in GF(2)[x]$. Then
the encoded codeword is $c = (c_{n-1}, \ldots, c_0) \in GF(2)^n$
such that $c(x) = c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 =
m(x)x^{n-k} + \mathrm{Rem}_{g(x)}(m(x)x^{n-k})$, where
$\mathrm{Rem}_{g(x)}(f(x))$ is the remainder polynomial of
dividing $f(x)$ by $g(x)$, that is,
$f(x) = q(x)g(x) + \mathrm{Rem}_{g(x)}(f(x))$ with
$\deg(\mathrm{Rem}_{g(x)}(f(x))) < \deg(g(x))$.
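The encoding rule above can be written out directly. This is a minimal software sketch (not the hardware encoder), assuming polynomials are held as integer bitmasks with bit $i$ the coefficient of $x^i$; `gf2_mod` and `encode_systematic` are illustrative names, and the generator polynomial is the one from Example 1.

```python
def gf2_mod(f, g):
    """Remainder of f(x) divided by g(x) over GF(2); g must be nonzero."""
    dg = g.bit_length() - 1
    while f.bit_length() - 1 >= dg:
        # cancel the leading term of f with a shifted copy of g
        f ^= g << (f.bit_length() - 1 - dg)
    return f


def encode_systematic(m, g, n, k):
    """Return c(x) = m(x) x^(n-k) + Rem_g(m(x) x^(n-k))."""
    shifted = m << (n - k)
    return shifted ^ gf2_mod(shifted, g)


g = 0b10100110111            # x^10 + x^8 + x^5 + x^4 + x^2 + x + 1
n, k = 15, 5
message = 0b10110            # an arbitrary 5-bit message
codeword = encode_systematic(message, g, n, k)

print(gf2_mod(codeword, g) == 0)        # the codeword is a multiple of g(x)
print(codeword >> (n - k) == message)   # the message appears unchanged
```

The top $k$ coefficients of $c(x)$ carry the message and the low $n - k$ coefficients carry the parity, which is what makes the encoding systematic.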

To multiply the input polynomial $u(x) \in GF(2)[x]$ by a
polynomial $h(x) \in GF(2)[x]$, we have an LFSR circuit that
implements the multiplication with at most
$nz(h) \le \deg(h(x)) + 1$ XOR gates, where $nz(h)$ is the
number of nonzero coefficients of $h(x)$. To divide the input
polynomial $u(x) \in GF(2)[x]$ by the polynomial
$h(x) \in GF(2)[x]$, we have an LFSR circuit with at most
$nz(h)$ XOR gates, which outputs the remainder polynomial
of the division (see [10-11]).
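A bit-serial division LFSR can be simulated in a few lines. The sketch below is one common arrangement (input fed into the low end of the register, most significant coefficient first) and is not necessarily the exact circuit of [10-11]; `lfsr_divide` is a hypothetical name. The single feedback signal drives one XOR per nonzero low-order coefficient of $h(x)$, which is where the fanout of a long divisor comes from.

```python
def lfsr_divide(bits_msb_first, h_coeffs):
    """Simulate a division LFSR for a monic h(x).

    h_coeffs[i] is the coefficient of x^i, with h_coeffs[-1] == 1.
    Returns the register contents: reg[i] is the coefficient of x^i
    of u(x) mod h(x) after all input bits have been shifted in.
    """
    d = len(h_coeffs) - 1
    reg = [0] * d
    for bit in bits_msb_first:
        feedback = reg[d - 1]            # coefficient about to overflow
        for i in range(d - 1, 0, -1):    # shift up, XOR feedback at the taps
            reg[i] = reg[i - 1] ^ (feedback & h_coeffs[i])
        reg[0] = bit ^ (feedback & h_coeffs[0])
    return reg


h = [1, 1, 0, 0, 1]                      # h(x) = x^4 + x + 1
remainder = lfsr_divide([1, 0, 0, 0, 0], h)   # u(x) = x^4
print(remainder)                         # [1, 1, 0, 0], i.e. x + 1
```

Each input bit costs one clock cycle, which is the serial-in limitation, and the feedback fans out to every tap of $h(x)$.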

Long BCH codes can sometimes achieve better performance
than RS (Reed-Solomon) codes, which are now widely used
in digital video broadcasting, optical communication and
magnetic recording systems. Hence BCH codes are of
great interest. Long BCH encoding and decoding can be
implemented directly by a linear feedback shift register (see
[3] and [10]). However, this LFSR-based architecture suffers
from the serial-in and serial-out limitation and the large fanout
effect. The LFSR-based systematic encoder of a long BCH
code is actually a division circuit with the divisor $g(x)$ (see
[10]), and the large fanout of some XOR gates leads to a
large gate delay. In high-speed applications such as optical
communication systems and digital video broadcasting, such
LFSR-based long BCH encoding cannot keep up with the
data transmission speed. Thus faster parallel processing of
long BCH encoding is needed.

Several parallel encoding architectures for long cyclic
codes have been proposed in [4-6]. In [7] and [8], K. K.
Parhi et al. presented a parallel architecture for
long BCH encoding based on the J-unfolding method (see [9]),
which can eliminate the large fanout effect.

In this paper we give a parallel architecture of long BCH
encoding which is based on the Chinese Remainder Theorem
(CRT, see [11]). The basic idea is to replace the
above long division LFSR circuit of the generator polynomial
$g(x)$ by several short division LFSR circuits of the low-degree
polynomials $w_1(x), \ldots, w_r(x)$ working in parallel. In this
process, we need some multiplication LFSR circuits, which have no
large fanout. The advantage of our novel parallel architecture is
that the fanout of the CRT-based long BCH encoder is bounded only
by $\log_2 N$, where $N$ is the code length of the BCH code.

II. CRT-based Parallel Architecture of Long BCH
Encoding

We need to recall the Chinese Remainder Theorem. Let
$f(x), g(x) \in GF(2)[x]$ be two polynomials. Suppose
$g(x) = g_1(x) \cdots g_r(x)$, where $g_1(x), \ldots, g_r(x)$ are
pairwise co-prime, that is, $\gcd(g_i(x), g_j(x)) = 1$ for any two
distinct $i$ and $j$. Let $g_i'(x) = g(x)/g_i(x) \in GF(2)[x]$ for
$i = 1, \ldots, r$. It is clear that
$\deg(g_i'(x)) = \deg(g(x)) - \deg(g_i(x))$
and $\gcd(g_i'(x), g_i(x)) = 1$. By using the extended Euclidean
algorithm we can find a polynomial $g_i''(x)$ such that
$\deg(g_i''(x)) < \deg(g_i(x))$ and
$g_i''(x) g_i'(x) \equiv 1 \bmod g_i(x)$
(i.e. $g_i''(x) g_i'(x) - 1$ is divisible by $g_i(x)$). We have the
following result.

CRT (see [11]). $\mathrm{Rem}_{g(x)}(f(x)) =
\sum_{i=1}^{r} g_i'(x)\, \mathrm{Rem}_{g_i(x)}(g_i''(x) f(x))$.
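The identity can be checked in two short steps. The following is a sketch of the standard argument, writing $g_i'(x) = g(x)/g_i(x)$ and $g_i''(x)$ for its inverse modulo $g_i(x)$; the symbol $S(x)$ for the right-hand side is introduced here only for convenience.

```latex
% S(x) denotes the right-hand side of the CRT identity.
\begin{align*}
S(x) &:= \sum_{i=1}^{r} g_i'(x)\,
        \mathrm{Rem}_{g_i(x)}\bigl(g_i''(x) f(x)\bigr), \\
% Step 1: g_j(x) divides g_i'(x) for every i \neq j, so only the
% j-th term survives modulo g_j(x).
S(x) &\equiv g_j'(x)\, g_j''(x)\, f(x) \equiv f(x) \pmod{g_j(x)}
        \quad (j = 1, \ldots, r), \\
% Step 2: each term has degree at most deg g_i' + deg g_i - 1.
\deg S(x) &\le \max_{i}\bigl(\deg g_i'(x) + \deg g_i(x) - 1\bigr)
        = \deg g(x) - 1 .
\end{align*}
% By Step 1, S(x) - f(x) is divisible by every g_j(x), hence by their
% product g(x), because the g_j(x) are pairwise co-prime. By Step 2,
% S(x) is then the unique remainder of f(x) modulo g(x).
```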

Let $g(x) = w_1(x) \cdots w_r(x)$ be the generator polynomial
of a BCH code, where $w_1(x), \ldots, w_r(x)$ are the distinct
irreducible polynomials in $GF(2)[x]$ as in the previous
section. From the theory of finite fields (see [11]),
$\deg(w_i(x)) \le t$, a fixed constant around $\log_2 N$, where
$N = 2^t - 1$ is the code length of the BCH code. It is
clear that these polynomials are pairwise co-prime. Set
$w_i'(x) = g(x)/w_i(x)$. Let $u_i(x) \in GF(2)[x]$ be the unique
polynomial of degree less than $\deg(w_i(x))$ such that
$u_i(x) w_i'(x) \equiv 1 \bmod w_i(x)$.

From the CRT identity $\mathrm{Rem}_{g(x)}(m(x)x^{n-k}) =
\sum_{i=1}^{r} w_i'(x)\, \mathrm{Rem}_{w_i(x)}(u_i(x) m(x) x^{n-k})$,
we immediately obtain a parallel architecture for computing
$\mathrm{Rem}_{g(x)}(m(x)x^{n-k})$. First we have $r$ parallel LFSR
circuits multiplying by $u_1(x), \ldots, u_r(x)$, then $r$ parallel
LFSR circuits dividing by $w_1(x), \ldots, w_r(x)$, then $r$ parallel
LFSR circuits multiplying by $w_1'(x), \ldots, w_r'(x)$ in the third
step, and finally a circuit summing the outputs of the previous
circuits.
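The data flow of the four stages can be modelled in software. The sketch below is a functional model only: it computes each stage with integer-bitmask polynomial arithmetic and does not model LFSR timing or fanout. The helper names are hypothetical, and the factors are those of Example 1.

```python
def gf2_mul(a, b):
    """Product in GF(2)[x], polynomials as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_divmod(f, g):
    """Quotient and remainder of f(x) / g(x) over GF(2); g nonzero."""
    q, dg = 0, g.bit_length() - 1
    while f.bit_length() - 1 >= dg:
        shift = f.bit_length() - 1 - dg
        q ^= 1 << shift
        f ^= g << shift
    return q, f


def gf2_inverse(a, modulus):
    """Extended Euclid: u(x) with u(x) a(x) = 1 mod modulus."""
    r0, r1 = modulus, gf2_divmod(a, modulus)[1]
    s0, s1 = 0, 1                 # invariant: s_i * a = r_i (mod modulus)
    while r1:
        q, r = gf2_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 ^ gf2_mul(q, s1)
    assert r0 == 1, "a and modulus must be co-prime"
    return gf2_divmod(s0, modulus)[1]


def crt_remainder(f, factors):
    """Rem_g(f) for g = product of pairwise co-prime factors, via CRT."""
    g = 1
    for w in factors:
        g = gf2_mul(g, w)
    total = 0
    for w in factors:             # the r branches are independent
        w_prime = gf2_divmod(g, w)[0]
        u = gf2_inverse(w_prime, w)
        stage1 = gf2_mul(u, f)              # multiply by u_i
        stage2 = gf2_divmod(stage1, w)[1]   # short division by w_i
        stage3 = gf2_mul(w_prime, stage2)   # multiply by w_i'
        total ^= stage3                     # final summation
    return total


factors = [0b10011, 0b11111, 0b111]   # w_1, w_2, w_3 of Example 1
g = 0b10100110111                     # their product
f = 0b10110 << 10                     # m(x) x^(n-k) for a sample message
print(crt_remainder(f, factors) == gf2_divmod(f, g)[1])
```

In hardware the polynomials $u_i$ and $w_i'$ are fixed by the code and would be precomputed; they are recomputed inside the loop here only to keep the sketch short.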

Here the fanout of the LFSR circuits dividing by
$w_1(x), \ldots, w_r(x)$ is upper bounded by $t$, which is around
$\log_2 N$. It is well known that the multiplying and summing
LFSR circuits have no large fanout effect and can be executed
with small latency. Compared with the direct LFSR-based
architecture, though the number of clock cycles is perhaps
increased in our architecture, the clock period is substantially
decreased by eliminating the large fanout effect. Thus our
parallel architecture for computing
$\mathrm{Rem}_{g(x)}(m(x)x^{n-k})$ (the systematic encoding of the
BCH code) is suitable for high-speed applications. The speed of
this CRT-based parallel architecture of long BCH encoding is
essentially dependent on the number $t$, which is around
$\log_2 N$, where $N$ is the code length of the BCH code.

III. Implementation and Further Comments

In this section the implementation and the cost of the 
CRT-based architecture of long BCH encoding are given. 

Implementation: 

Step 1. Multiplication LFSRs for the polynomials
$u_1, \ldots, u_r$ with the input polynomial $m(x)x^{n-k}$. Here
the circuits need $\sum_{i=1}^{r}(\deg(u_i) + 1)$ XOR gates.

Step 2. Division LFSRs for the polynomials $w_1, \ldots, w_r$,
whose inputs are the outputs of the circuits in Step 1. Here the
circuits need $\sum_{i=1}^{r}(\deg(w_i) + 1)$ XOR gates.

Step 3. Multiplication LFSRs for the polynomials
$w_1', \ldots, w_r'$, whose inputs are the outputs of the circuits
in Step 2. Here the circuits need
$\sum_{i=1}^{r}(\deg(g) - \deg(w_i) + 1)$ XOR gates.

Step 4. The summation LFSR of the $r$ outputs in Step 3.
Here the circuit needs at most $r(t + 1)$ XOR gates.

We can directly get an upper bound on the number of
XOR gates used: it is upper bounded by
$\sum_{i=1}^{r}(\deg(u_i) + 1 + \deg(w_i) + 1 + \deg(w_i') + 1)
+ r(t + 1) \le 2r(t + 1) + r(\deg(g) + 2)$. From the estimation of
$r$, this number is roughly
$2\deg(g) + \frac{\deg(g)}{t}(\deg(g) + 2)$.

Example 2 (see [7]). We consider the BCH code with code
length $2047 = 2^{11} - 1$ and dimension 1926. Its generator
polynomial is a degree 121 polynomial $g(x) \in GF(2)[x]$,
which is the product of 11 distinct irreducible polynomials
$w_1, \ldots, w_{11}$ of degree 11. Thus our architecture needs at
most 1595 XOR gates. The fanout is upper bounded
by $\deg(w_1), \ldots, \deg(w_{11})$, which is at most 11. In some
sense this is better than the architecture in [7].

Example 3 (see [8]). We consider the BCH code with
code length $N = 2^{13} - 1$ and dimension 7684. Its generator
polynomial is a degree 507 polynomial $g(x)$ in $GF(2)[x]$, and
$g(x)$ is the product of 39 distinct irreducible polynomials of
degree 13 in $GF(2)[x]$. Thus our architecture needs at most
20865 XOR gates. The fanout of the XOR gates in the
architecture is at most 13. In some respects this is better than
the architecture in [8].
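The closed-form bound $2r(t + 1) + r(\deg(g) + 2)$ from this section can be evaluated for the parameters of Examples 2 and 3. The sketch below does only that arithmetic; `xor_gate_bound` is a hypothetical helper, and the gate counts 1595 and 20865 are taken from the text as stated rather than re-derived.

```python
def xor_gate_bound(r, t, deg_g):
    """Closed-form upper bound 2r(t+1) + r(deg(g)+2) on the XOR count."""
    return 2 * r * (t + 1) + r * (deg_g + 2)


# (r, t, deg(g), XOR count stated in the text)
examples = {
    "Example 2": (11, 11, 121, 1595),
    "Example 3": (39, 13, 507, 20865),
}

bounds = {}
for name, (r, t, deg_g, stated) in examples.items():
    assert r * t == deg_g          # t is prime, so r = deg(g) / t
    bounds[name] = xor_gate_bound(r, t, deg_g)
    print(name, "bound:", bounds[name], "stated:", stated)
```

For these parameters the bound evaluates to 1617 and 20943, so both stated counts lie below it.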

It is clear that the idea can be used for systematic encoding
of any long cyclic code with generator polynomial
$g(x) = g_1(x) \cdots g_r(x)$, where $g_1, \ldots, g_r$ are pairwise
co-prime polynomials. Secondly, in some cases, if we can choose
among generator polynomials giving the same code parameters, it is
better to use a generator polynomial $g(x) \in GF(2)[x]$
with the property that the numbers of nonzero coefficients in
$g_1, \ldots, g_r$ are as small as possible. However, the idea of the
CRT-based architecture cannot be used for the encoding of long
CRC codes (see [4-6]) because the generator polynomials of
CRC codes are irreducible.

IV. Conclusion

In this paper we have presented a CRT-based high-speed
parallel architecture for long BCH encoding. The architecture
can be used to eliminate the large fanout effect. The only
limitation of this CRT-based parallel architecture is the
logarithm of the code length of the BCH code. It should be
noted that our technique of using CRT to transform the
long division LFSR of a polynomial $g(x) = g_1(x) \cdots g_r(x)$,
where $g_1, \ldots, g_r$ are pairwise co-prime, into short division
LFSRs in parallel can be used for systematic encoding of any long
cyclic code generated by $g(x) \in GF(2)[x]$.



